52 research outputs found

    Stata Now Available

    Block-based execution on an integrated vector-scalar in-order core

    In the low-end mobile processor market, power, energy and area budgets are significantly lower than in the server/desktop/laptop/high-end mobile markets. It has been shown that vector processors are a highly energy-efficient way to increase performance, but adding support for them incurs area and power overheads that may not be acceptable for low-end mobile processors. In this work, we propose an integrated vector-scalar design that mostly reuses scalar hardware to support the execution of vector instructions. The key element of the design is our proposed block-based model of execution, which groups vector instructions to execute them in a coordinated manner.

    Rapid Evaluation of Requirements for Vector Micro-Architectures

    Power consumption has become one of the dominant issues in processor design, especially in embedded systems and data centers. One possible solution that addresses this issue, while providing higher performance for existing applications and new capabilities for future applications in hand-held devices and data centers, is to use a vector processor. This thesis presents the design and implementation of a vector library that enables the vectorization of the target applications and allows them to be characterized. We also present the ETModel, a simple trace-driven simulator for vector processors. It is used to analyse the micro-architectural requirements of the vectorized applications. We show that the target applications are highly vectorizable, with a degree of vectorization from 62.9% for H264ref to 91% for ECLAT. Detailed instruction-level characteristics, such as the distribution of vector instructions and the distribution of vector lengths, are also presented. The thesis contains a detailed timing analysis of the vectorized applications for different micro-architectural configurations of a vector processor. We measured the execution time for different configurations of the cache hierarchy, main memory latencies, maximum vector lengths and functional units, as well as the usage of the functional units. These results help in understanding the behavior of the vectorized applications and the requirements of a vector micro-architecture.

    Vector processing-aware advanced clock-gating techniques for low-power fused multiply-add

    The need for power efficiency is driving a rethink of design decisions in processor architectures. While vector processors succeeded in the high-performance market in the past, they need retailoring for the mobile market that they are entering now. The floating-point (FP) fused multiply-add (FMA) unit, being a functional unit with high power consumption, deserves special attention. Although clock gating is a well-known method to reduce switching power in synchronous designs, there are unexplored opportunities for its application to vector processors, especially when considering the active operating mode. In this research, we comprehensively identify, propose, and evaluate the most suitable clock-gating techniques for vector FMA units (VFUs). These techniques ensure power savings without jeopardizing the timing. We evaluate the proposed techniques using both synthetic and “real-world” application-based benchmarking. Using vector masking and vector multilane-aware clock gating, we report power reductions of up to 52%, assuming an active VFU operating at peak performance. Among other findings, we observe that vector instruction-based clock-gating techniques achieve power savings for all vector FP instructions. Finally, when evaluating all techniques together using “real-world” benchmarking, the power reductions are up to 80%. Additionally, in accordance with processor design trends, we perform this research in a fully parameterizable and automated fashion. The research leading to these results has received funding from the RoMoL ERC Advanced Grant GA 321253 and is supported in part by the European Union (FEDER funds) under contract TIN2015-65316-P. The work of I. Ratkovic was supported by an FPU research grant from the Spanish MECD. Peer reviewed. Postprint (author's final draft).

    Evaluation of vectorization potential of Graph500 on Intel's Xeon Phi

    Graph500 is a data-intensive application for high-performance computing, and it is an increasingly important workload because graphs are a core part of most analytic applications. So far, no work has examined whether Graph500 is suitable for vectorization, mostly due to a lack of vector memory instructions for irregular memory accesses. The Xeon Phi is a massively parallel processor recently released by Intel with new features such as a wide 512-bit vector unit and vector scatter/gather instructions. Thus, the Xeon Phi allows for more efficient parallelization of Graph500 combined with vectorization. In this paper we vectorize Graph500 and analyze the impact of vectorization and prefetching on the Xeon Phi. We also show that the combination of parallelization, vectorization and prefetching yields a speedup of 27% over a parallel version with prefetching that does not leverage the vector capabilities of the Xeon Phi. The research leading to these results has received funding from the European Research Council under the European Union's 7th FP (FP/2007-2013) / ERC GA n. 321253. It has been partially funded by the Spanish Government (TIN2012-34557). Peer reviewed. Postprint (published version).

    Spirulina Phycobiliproteins as Food Components and Complements

    Spirulina has a documented history of use as a food for more than 1000 years, and has been in production as a dietary supplement for 40 years. Among Spirulina's many bioactive components, the blue protein C-phycocyanin and its linear tetrapyrrole chromophore phycocyanobilin occupy a special place due to their broad possibilities for application in various areas of food technology. The subject of this chapter is the up-to-date food applications of these Spirulina components, with a focus on their use as food colorants, additives, nutraceuticals, and dietary supplements. Other current and potential future food applications are also briefly presented and discussed.

    POSTER: An Integrated Vector-Scalar Design on an In-order ARM Core

    In the low-end mobile processor market, power, energy and area budgets are significantly lower than in other markets (e.g. servers or high-end mobile markets). It has been shown that vector processors are a highly energy-efficient way to increase performance; however, adding support for them incurs area and power overheads that would not be acceptable for low-end mobile processors. In this work, we propose an integrated vector-scalar design for the ARM architecture that mostly reuses scalar hardware to support the execution of vector instructions. The key element of the design is our proposed block-based model of execution that groups vector computational instructions together to execute them in a coordinated manner. The research leading to these results has received funding from the RoMoL ERC Advanced Grant GA no. 321253 and is supported in part by the European Union (FEDER funds) under contract TIN2015-65316-P. This research has also been supported by the Agency for Management of University and Research Grants (AGAUR - FI-DGR 2014). Peer reviewed. Postprint (author's final draft).

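    Genotoxic effect of the methanol extract of the plant Cotinus coggygria Scop. in Drosophila melanogaster (Genotoksični efekat metanolskog ekstrakta biljke Cotinus coggygria Scop. kod Drosophila melanogaster)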

    Plant extracts that appear to have favorable properties may contain chemical compounds with mutagenic, teratogenic and/or carcinogenic activity, so it is of great importance to include genotoxicity testing in the toxicological evaluation of plant extracts. Using a comet assay on the eukaryotic model organism Drosophila melanogaster under in vivo conditions, the potential genotoxic activity of the methanol extract of the plant Cotinus coggygria Scop. was determined. Treatment with the methanol extract at a concentration of 1% caused no statistically significant changes compared to the negative control. Based on the distribution of comet classes and selected quantitative parameters (% DNA in tail and tail length), it can be concluded that a methanol extract obtained from C. coggygria at a concentration of 1% does not show genotoxic activity.

    An integrated vector-scalar design on an in-order ARM core

    In the low-end mobile processor market, power, energy, and area budgets are significantly lower than in the server/desktop/laptop/high-end mobile markets. It has been shown that vector processors are a highly energy-efficient way to increase performance; however, adding support for them incurs area and power overheads that would not be acceptable for low-end mobile processors. In this work, we propose an integrated vector-scalar design for the ARM architecture that mostly reuses scalar hardware to support the execution of vector instructions. The key element of the design is our proposed block-based model of execution that groups vector computational instructions together to execute them in a coordinated manner. We implemented a classic vector unit and compared its results against our integrated design. Our integrated design improves the performance (more than 6×) and energy consumption (up to 5×) of a scalar in-order core with negligible area overhead (only 4.7% when using a vector register with 32 elements). In contrast, the area overhead of the classic vector unit can be significant (around 44%) if a dedicated vector floating-point unit is incorporated. Our block-based vector execution outperforms the classic vector unit for all kernels with floating-point data and also consumes less energy. We also complement the integrated design with three energy/performance-efficient techniques that further reduce power and increase performance. The first proposal covers the design and implementation of chaining logic that is optimized to work with the cache hierarchy through vector memory instructions. The second proposal reduces the number of reads from and writes to the vector register file. The third optimizes complex memory access patterns with the memory shape instruction and unified indexed vector load. The research leading to these results has received funding from the RoMoL ERC Advanced Grant GA no. 321253 and is supported in part by the European Union (FEDER funds) under contract TIN2015-65316-P. This research has also been supported by the Agency for Management of University and Research Grants (AGAUR - FI-DGR 2014). O. Palomar is funded by a Royal Society Newton International Fellowship. Peer reviewed. Postprint (author's final draft).